Data transfer method and system for loudspeakers in a digital sound reproduction system

ABSTRACT

The present publication describes a data transfer method and system in a digital sound reproduction system. The method comprises method steps for generating a digital audio stream for multiple channels in a host data source, e.g. a computer (1), the audio stream being formed by multiple consecutive samples, and receiving the digital audio stream sent by the host data source (1) through a digital data transmission network by several digital receivers (2), each of which includes a microcontroller with a clock, the receivers (2) further including means for generating an audio signal. In accordance with the invention, the host data source (1) repeatedly sends a synchronization sample (60) to at least one receiver (2), the receiver (2) replies to the synchronization sample (60) with a return sample (61), the host (1) calculates a latency (T) for each receiver (2) based on the sending time (Th₁) of the synchronization sample (60), the reception time (Th₂) of the return sample (61) and the processing time (Tt₂−Tt₁) of the receiver (2), the host (1) sends to the receiver (2) information on the calculated latency (T) in combination with the time stamp of the measurement time, based on this information the receiver (2) adjusts the function of its clock, and the above synchronization steps are repeated continuously.

The present invention relates to a data transfer method according to the preamble of Claim 1.

The invention also relates to a data transfer system.

According to the prior art, there are several commercial systems for digital audio reproduction in digital networks. For example, the following products are available today. The Gibson MaGIC™ network, Cobra Net™, EtherSound™, Livewire™, MADI™ and others describe systems by which audio data may be streamed to digital loudspeakers or sound reproduction systems. Basically, the quality of the reproduction in these systems is fairly good for home use, but for professional use the digital transfer technology causes some problems.

In accordance with the prior art, the above problem has been solved by buffering the information into receivers and controlling the unloading of the information from the receivers.

In more detail, to synchronize clocks over Ethernet connections, the exact travel time of network packets must be measured. This is difficult for two reasons. First, the standard network socket API introduces random latency between calling the user-mode send function and the actual output of the packet, depending on the status of the operating system. The same applies to reception of packets: the time between reception of a packet from the network and its indication to the user-mode process listening to the UDP socket cannot be accurately determined.

Secondly, when a packet travels through the network it goes through one or more hubs, switches or routers. Each device may randomly delay packets depending on the load of the network and the state of the device. This introduces random latency in travel time that cannot be predicted. When measured, the latency is found to be nearly constant for most of the packets, but some packets may be delayed by several hundreds of microseconds or even more.

The invention is intended to eliminate some defects of the state of the art disclosed above and for this purpose create an entirely new type of method and apparatus for data transfer in a sound reproduction system.

The invention is based on implementing network packet time stamping in the network protocol stack so that accurate times for sending and receipt of packets can be determined. In a preferred embodiment, the receiver software implements the time stamping directly in the Ethernet driver (for which we have source code) for the most accurate operation possible.

The second problem is preferably solved simply by running the clock synchronization, which includes determination of the round-trip time between host and receiver, and performing the synchronization only if the latency is within an acceptable range from the measured minimum latency.

More specifically, the method according to the invention is characterized by what is stated in the characterizing portion of Claim 1.

The system according to the invention is, in turn, characterized by what is stated in the characterizing portion of Claim 6.

Considerable advantages are gained with the aid of the invention.

The present invention is especially suitable for multi-channel sound reproduction systems, where a data stream including audio information of multiple audio channels, to be reproduced simultaneously in several loudspeakers, is sent along the same data transfer path.

With the aid of the method according to the invention, a statistical latency time may be defined in a start-up procedure and this value used as a reference latency time for further, continuous latency measurement.

By these two methods the audio reproduction system may adapt to the load of the network and make suitable adjustments in order to maintain high quality and synchronized multi-channel audio reproduction in most of the load variation cases.

In the following, the invention is examined with the aid of examples and with reference to the accompanying drawings.

FIG. 1 shows as a block diagram a digital audio system, which can be used in connection with the present invention.

FIG. 2 shows as a block diagram one network management host system in accordance with the invention.

FIG. 3 shows as a block diagram one receiver management system according to the invention.

FIG. 4 shows as a timing diagram a method in accordance with the invention.

FIG. 6 shows as a flow chart a synchronization protocol in the receiver in accordance with the invention.

FIG. 7 shows as a flow chart a synchronization protocol in the host in accordance with the invention.

In the invention, the following terminology is used in connection with the reference numbers. However, the list is not exhaustive, especially relating to the block and flow diagrams of FIGS. 7-11:

-   1 host or host data source
-   2 receiver, digital loudspeaker
-   2a wireless receiver
-   3 switch, network
-   4 group of receivers
-   10 hard disc
-   12 virtual software audio adapter (driver)
-   13 audio data manager
-   14 synchronization manager
-   15 network interface
-   16 network timestamping
-   17 system clock
-   20 network interface
-   22 timer hardware
-   23 adjustable oscillator
-   24 loudspeaker networks communications
-   25 synchronization controller
-   26 digital to analog conversion
-   27 audio stream controller
-   28 data output controller
-   29 sample rate converter
-   60 synchronization signal/ECHO REQ
-   61 Return message/ECHO RESP
-   62 Control Command/SET CLOCK
-   150 Wireless Local Area Network (WLAN) access point

The following acronyms and abbreviations are also used in the following text.

-   DHCP Dynamic Host Configuration Protocol
-   FEC Forward Error Correction
-   GLM Genelec Loudspeaker Manager
-   Global LSNW Multicast address:port to which all global LSNW traffic is sent. All address receivers listen to this address to receive DISCOVERY, ANNOUNCE, GROUP and other global messages.
-   Group LSNW Multicast address:port to which all data directed to a set of grouped address receivers is sent. All receivers that are assigned to the same group listen to the same group address. The group address will receive clock synchronization messages, streamed audio and GLM control messages.
-   Host Application that manages the loudspeaker network, streams audio and sends GLM control messages.
-   IP Internet Protocol
-   LSNW Loudspeaker Network
-   Multicast A special IP address that will be routed to members of a multicast address group.
-   Receiver Processor, network interface and the software that connects a loudspeaker to the IP network
-   UDP User Datagram Protocol

Further, in this application latency means the network delay between two network elements for a data sample.

In accordance with FIG. 1, the system in accordance with the invention comprises at least one host computer 1 or host data source for controlling the system and several receivers 2 connected to the host computers 1 via a digital network 3 comprising the signal path 3 formed by cables, connectors, network adapters, switches etc.

In other words, the LSNW (Loudspeaker Network) system consists of one or more hosts 1 that each manage sets of receiver devices 2. Hosts 1 act as the source of management, control and audio data to the receivers 2. Hosts 1 are responsible for discovering receivers 2 connected to the IP network, managing groups 4 of receivers and providing them with audio. Receivers 2 respond to commands and play back audio data from hosts 1.

In accordance with FIG. 2, the host system typically comprises a hard disc 10 on which digital audio data may be stored. Some other non-volatile medium, such as flash memory, can also be used. Digital audio data may be acquired from a virtual software audio adapter (driver) 12 that redirects audio to networked loudspeakers. Audio data manager 13 acquires digital audio data and makes it suitable for streaming. Streaming and synchronization manager 14 controls clock synchronization of the loudspeaker devices (receivers) currently controlled by the host. Network interface 15 connects the host to the computer communications network. Network timestamp module 16 manages accurate timing of synchronization-related network traffic. This is required to reduce the effects of random latencies introduced by the non-real-time operating system (such as Windows, Linux etc.) run by the host. System clock 17 provides accurate time information used by the synchronization manager, and a standard Ethernet network 3 enables IP-based communications between the host and the receivers.

The host application manages the loudspeaker network and routes management information from GLM and audio from audio software to receivers. The host application will run as a background daemon process on the host computer. On the Windows platform, these background processes are usually referred to as services or system services.

The host provides an interface for GLM software to send and receive GLM messages to and from receivers as if the GLM software were using a GLM network.

Host software will provide a standard audio interface for audio software to send audio to LSNW receivers. Such interfaces are, for example, ASIO and Windows audio. The audio software will see LSNW receivers as channels in the virtual audio interface provided by the host.

Host will include proprietary kernel-mode driver software to provide the necessary virtual audio interface and UDP [...]

In the receiver (FIG. 3), network interface 20 connects the receiver to communications network 3. Timer hardware 22 provides time information for the system clock and synchronization controller. Adjustable oscillator 23 provides the clock signal for the timer hardware and audio data output controller 28. Loudspeaker networks communications module 24 manages network traffic to and from the host computer. Synchronization controller 25 synchronizes the receiver's clock with the host. It adjusts the clock oscillator in order to minimize clock drift between receiver and host clocks. Digital signal processing and digital-to-analog conversion take place in block 26. Audio stream controller 27 manages audio data received from the host and feeds it to audio data output controller 28. Audio data output controller 28 outputs audio data at the rate specified by adjustable oscillator 23. This guarantees that samples will be output at the same rate as the host outputs them. Sample rate converter 29 converts digital audio to the internal sample rate used by digital signal processing and digital-to-analog conversion.

Accurate clock synchronization is essential for correct working of the LSNW. The LSNW protocol has a mechanism for clock synchronization that enables synchronization of host and receiver clocks within an accuracy of about 10-20 microseconds.

The solution to the travel time (latency) measurement is to implement network packet time stamping in the network protocol stack so that accurate times for sending and receipt of packets can be determined. In the Windows host software the time stamping is implemented as an IP Packet Filter that examines incoming and outgoing UDP packets and records time stamps if a packet is destined to or originates from an LSNW receiver. This location is not optimal for time stamping, as the time stamps should be collected as near the network hardware as possible, but experience shows that time stamping at the IP Packet Filter level gives good accuracy.

The receiver software in accordance with the invention implements the time stamping directly in the Ethernet driver for the most accurate operation possible. For this purpose, source code has been developed in connection with the invention.

The problem of random variation of network latency can be solved simply by running the clock synchronization, which includes determination of the round-trip time between host and receiver, and performing the synchronization only if the latency is within an acceptable range from the measured minimum latency.

Clock synchronization is initiated by the host in accordance with FIG. 4. The host 1 will synchronize clocks with each group member in a round-robin fashion to guarantee that all receivers have accurate time. A receiver may send a SYNCH REQUEST message to the host if it needs to resynchronize its clock. This can happen, for example, if the receiver must interrupt the audio stream due to packet loss and continues it when audio packets are received again.

When a receiver 2 is assigned to a group, the host will send several ECHO REQ packets 60 to the receiver to probe the roundtrip latency. The receiver 2 will reply with ECHO RESP 61, and the host 1 will then determine the roundtrip latency (T) for each transaction. Once the roundtrip latency has been determined with adequate accuracy, the host 1 will set the maximum acceptable roundtrip latency for successful synchronization. The latency also changes as a function of packet size, so the latency is probed for packets of different sizes.

The actual roundtrip latency is measured as follows:

1. Send ECHO REQ 60 to receiver 2 (add extra payload to increase packet size if necessary; the receiver will not process the extra payload, as it is used only to change the actual UDP datagram size to determine latency for different packet sizes)
2. Get timestamp TSsend (Th₁) for the packet containing ECHO REQ 60 from the timestamp driver
3. Receive ECHO RESP 61 from the receiver; it will contain the receiver ProcessingLatency, which is the number of microseconds the receiver spent between receipt of the ECHO REQ 60 and sending of ECHO RESP 61
4. Get timestamp TSrecv (Th₂) for the ECHO RESP 61 packet. The timestamp is formed by the host 1
5. Roundtrip latency is TSrecv − TSsend − ProcessingLatency
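
The measurement steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `roundtrip_latency_us`, the use of integer microseconds, the sanity check and the sample values are assumptions and are not taken from the protocol described here.

```python
def roundtrip_latency_us(ts_send_us: int, ts_recv_us: int,
                         processing_latency_us: int) -> int:
    """Network roundtrip latency in microseconds (step 5).

    ts_send_us and ts_recv_us are the host-side timestamps TSsend (Th1)
    and TSrecv (Th2); processing_latency_us is the ProcessingLatency
    value reported by the receiver in ECHO RESP.
    """
    latency = ts_recv_us - ts_send_us - processing_latency_us
    if latency < 0:
        # Hypothetical sanity check: a negative result means the inputs
        # cannot belong to the same transaction.
        raise ValueError("inconsistent timestamps")
    return latency


# Illustrative values only: 890 us elapse between TSsend and TSrecv, of
# which the receiver reports 550 us as its own processing time.
example_latency = roundtrip_latency_us(10_000_000, 10_000_890, 550)
```

Subtracting the receiver's own processing time leaves only the time the two packets spent in the network, which is the quantity compared against the acceptable maximum.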

Actual clock synchronization starts like the request-response transaction in the initialization phase. The host sends an ECHO REQ 60 and the receiver replies with ECHO RESP 61.

The ECHO RESP 61 packet contains two values: the receiver's clock at the time Tt₁ of receipt of the ECHO REQ 60 packet, and ProcessingLatency, the time spent by the receiver between receipt of ECHO REQ 60 and sending of ECHO RESP 61.

The host 1 will calculate the roundtrip latency as in the initialization phase, and if the latency is below the maximum acceptable value determined in initialization, the host sends the SET CLOCK message 62 to receiver 2. The message contains an estimate of the host's clock at the time the receiver received the ECHO REQ 60 packet. The estimated time is calculated by adding half of the measured roundtrip time to the time of outputting the ECHO REQ 60 packet.
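
The estimate carried by the clock-setting message can be illustrated with a short Python sketch. The function name `set_clock_value_us`, the microsecond units and the choice to treat a latency exactly equal to the maximum as acceptable are assumptions made for illustration only.

```python
from typing import Optional


def set_clock_value_us(ts_send_us: int, ts_recv_us: int,
                       processing_latency_us: int,
                       max_acceptable_us: int) -> Optional[float]:
    """Return the host-clock estimate to send to the receiver, or None.

    None means the measured roundtrip latency exceeded the maximum
    acceptable value, so this transaction is not used for synchronization.
    """
    roundtrip_us = ts_recv_us - ts_send_us - processing_latency_us
    if roundtrip_us > max_acceptable_us:
        return None
    # Assumes symmetric latency: ECHO REQ took half of the roundtrip, so
    # the receiver got it at (send time + roundtrip / 2) on the host clock.
    return ts_send_us + roundtrip_us / 2
```

For example, with an illustrative send time of 10 000 000 µs, a reply 890 µs later and 550 µs of reported processing time, the roundtrip is 340 µs and the estimate is 10 000 170 µs.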

The protocol assumes that the network latency from host to receiver is equal to the latency from receiver to host. This is usually the case, but the roundtrip will become asymmetric when ECHO REQ is appended to audio data, as a packet that contains ECHO REQ and audio data is much larger than a response packet that contains only ECHO RESP. This asymmetry can be compensated by appending extra data to ECHO RESP to make the response packet the same size as the request. In real applications, the asymmetry of network packet sizes does not have a very large effect on the actual result of the synchronization. The effect of asymmetric network latency on the offset between host and receiver clocks can be calculated as follows (for simplicity, the calculation does not include processing latency):

1. Host clock after synchronization will be Th₁ + L_ht + L_th + L_ht (= host time at start + latency of ECHO REQ + latency of ECHO RESP + latency of SET CLOCK)
2. Receiver clock at the end of synchronization will be Th₁ + (L_th + L_ht)/2 + L_th + L_ht (= SET CLOCK time + latency of ECHO RESP + latency of SET CLOCK)

Here L_ht is the latency from host to receiver (target) and L_th is the latency from receiver to host. The difference of the clocks (receiver clock minus host clock) will be Th₁ + (L_th + L_ht)/2 + L_th + L_ht − (Th₁ + L_ht + L_th + L_ht) = (L_th − L_ht)/2

In more detail, in accordance with FIG. 5, host and receiver clocks have a 2 second offset at host time 10.000000 s (Th₁). Synch protocol packet latencies are 0.000160 s from host to receiver for ECHO REQ 60 (Th₁ to Tt₁) and 0.000180 s from receiver to host for ECHO RESP 61 (Tt₂ to Th₂). This will result in a 0.000010 s clock offset at the end of synchronization, assuming that the receiver's clock does not significantly drift from the host clock between target time 12.000160 and 12.000710. Receiver 2 may also correct the frequency of its clock based on the measured offsets and so reduce the average error between target and host clocks.

Since the network latencies (0.000160 s and 0.000180 s) were not equal, host and target clocks will have an offset of (0.000180 − 0.000160)/2 = 0.000010 s at the end of synchronization (Tt₃).
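
The arithmetic of this example can be checked with a small Python simulation. It is a sketch under stated assumptions: the function name `residual_offset_us` is hypothetical, times are expressed in microseconds, and the receiver is assumed to set its clock exactly to the estimate it receives.

```python
def residual_offset_us(th1_us: float, l_ht_us: float, l_th_us: float,
                       processing_us: float = 0.0) -> float:
    """Receiver clock minus host clock after the clock has been set."""
    host_at_receipt = th1_us + l_ht_us        # true host time when ECHO REQ arrives
    th2_us = host_at_receipt + processing_us + l_th_us
    roundtrip_us = th2_us - th1_us - processing_us
    estimate_us = th1_us + roundtrip_us / 2   # host's estimate of that same instant
    return estimate_us - host_at_receipt


# Values from the example: 160 us host to receiver, 180 us receiver to host.
offset_us = residual_offset_us(10_000_000, 160, 180)
```

With these inputs the result is 10 µs, which equals half of the difference between the two one-way latencies; with equal latencies the residual offset is zero, and the receiver's processing time cancels out.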

In accordance with FIG. 6, at start 90 the receiver initializes hardware, possibly acquires an IP address via DHCP and enters the Idle state. In block 91 the receiver receives the SET GROUP command from the host. The message contains the IP address of the multicast group to which all the loudspeaker group related traffic is sent. The message also contains information on which channel of multi-channel audio the receiver is to output to digital-to-analog conversion. The receiver starts to listen to the multicast address. It also sends a message to the host acknowledging that the receiver has entered the group. The receiver enters state 92, RUNNING.

In running state 92 the receiver receives messages directed to the loudspeaker group multicast IP address. Audio data is entered into the play queue and eventually output to digital-to-analog conversion. If the receiver receives a REQUEST TIMESTAMP message it enters state 97, SEND TIMESTAMP TO HOST. If the receiver receives a SET CLOCK message it enters state 93.

In block 93 the validity of the new clock value is determined based on the current time, the estimate of clock drift between host and receiver, and the time interval since the last SET CLOCK message. If the new value appears invalid (due to large processing latency in the host or some other reason), the receiver clock is not set and control returns to state 92, RUNNING. If the new clock value appears valid, state 94, ADJUST OSCILLATOR, is entered. The control voltage to the adjustable oscillator is set in block 94 based on the measured drift between host and receiver clocks and the current control voltage. In block 95, if the measured clock offset between receiver and host is less than the duration of a specified number of samples, state 92, RUNNING, is entered. In block 96, if the measured clock offset between receiver and host is more than the duration of the specified number of samples, the clock value is adjusted by a multiple of sample durations. At the same time, samples are added to or removed from the audio stream to compensate for the clock adjustment. After the adjustment, control returns to state 92, RUNNING.

Further, in block 97 a TIMESTAMP message containing the current receiver clock value and processing latency is sent to the host.
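
The decision made in blocks 93, 95 and 96 can be sketched as follows. This Python fragment is a simplified illustration, not the receiver firmware: the names `ClockAction` and `handle_set_clock` are hypothetical, the validity test of block 93 is reduced to a single assumed limit `plausible_limit_us`, and the oscillator control of block 94 is left out. The sign convention for adding or removing samples is deliberately left to the caller.

```python
from dataclasses import dataclass


@dataclass
class ClockAction:
    accepted: bool        # False: the new clock value is ignored (block 93)
    whole_samples: int    # sample periods to step the clock by (block 96)
    clock_step_us: float  # the same step expressed in microseconds


def handle_set_clock(offset_us: float, plausible_limit_us: float,
                     sample_rate_hz: int, threshold_samples: int) -> ClockAction:
    """Decide how to react to a measured receiver-host clock offset."""
    # Block 93 (simplified, assumed rule): reject implausible values.
    if abs(offset_us) > plausible_limit_us:
        return ClockAction(False, 0, 0.0)

    sample_period_us = 1_000_000 / sample_rate_hz
    # Block 95: small offsets are left to the oscillator adjustment alone.
    if abs(offset_us) < threshold_samples * sample_period_us:
        return ClockAction(True, 0, 0.0)

    # Block 96: step the clock by a whole number of sample periods; the
    # audio stream must add or remove the same number of samples.
    whole = int(offset_us / sample_period_us)  # truncates toward zero
    return ClockAction(True, whole, whole * sample_period_us)
```

At an assumed sample rate of 48 kHz one sample lasts about 20.8 µs, so with a threshold of two samples a 10 µs offset leaves the clock value untouched, while a 100 µs offset yields a step of four whole samples.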

In accordance with FIG. 7, in block 100 the host application is started. It queries the network for available receiver loudspeakers and enters the IDLE state. In block 101 the host application receives a command from the user interface to set up a receiver loudspeaker group. It starts analyzing network latency to each loudspeaker. In block 102, if the analysis was not successful, an error is reported to the user and the system returns to the IDLE state. The analysis may not succeed, for example, if the packet loss in the network is too large. If the analysis of network latencies to each receiver is successful, the maximum acceptable synchronization network latency is stored for each receiver and state 103, RUNNING, is entered.

In running state 103 the system periodically synchronizes the receiver loudspeaker clocks. In block 104 a timestamp request is sent to the receiver. If a reply is not received within a given period, the system returns to the running state and retries the synchronization. If the synchronization fails several times consecutively, the system marks the receiver loudspeaker as inactive and removes it from the group of active receivers. If TIMESTAMP is received from the receiver, the system enters state 105. In block 105 the system determines the network latency for the synchronization transaction. If it is above the maximum acceptable synchronization network latency determined in 101, the system enters state 108. If the latency is below the acceptable maximum, the system enters state 106. In state 106 the system sends the SET CLOCK message to the receiver. In block 107, if the time since the last latency analysis is below a given threshold value, the system enters state 103, RUNNING. If the time elapsed since the last analysis is too large, the system re-enters state 101 and reanalyzes network latency to the receiver to detect whether the network latency has been permanently reduced. In block 108, if more than a given number of consecutive synchronization transactions have network latency larger than the acceptable maximum, the system performs latency analysis in order to determine permanent growth of network latency.
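
The host-side decisions of blocks 104 to 108 can be summarized in a short Python sketch. It is a hedged illustration only: the names `HostSyncState` and `sync_step`, the action strings and the default limits are hypothetical, and the time-based reanalysis of block 107 is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostSyncState:
    missed_replies: int = 0  # consecutive transactions without TIMESTAMP
    high_latency: int = 0    # consecutive transactions above the maximum


def sync_step(state: HostSyncState, latency_us: Optional[float],
              max_latency_us: float, max_missed: int = 3,
              max_high: int = 5) -> str:
    """Process one synchronization transaction and return the next action.

    latency_us is None when no TIMESTAMP reply arrived in time.
    """
    if latency_us is None:                   # block 104: no reply
        state.missed_replies += 1
        if state.missed_replies >= max_missed:
            return "MARK_INACTIVE"
        return "RETRY"
    state.missed_replies = 0

    if latency_us > max_latency_us:          # block 105 leads to block 108
        state.high_latency += 1
        if state.high_latency > max_high:
            state.high_latency = 0
            return "REANALYZE_LATENCY"
        return "SKIP"
    state.high_latency = 0
    return "SEND_SET_CLOCK"                  # block 106
```

Keeping the two counters separate reflects the two different failure modes in the flow chart: a silent receiver is eventually dropped from the group, whereas a persistently slow network triggers a new latency analysis.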

According to one embodiment of the invention, the proposed synchronization method can in principle also be utilized in wireless audio applications, such as wireless loudspeaker systems.

Due to the lower transfer rate and delays introduced by the media access control of standard wireless networks, such as 802.11 a/b/g, the network latencies are considerably larger than in wired 100 Mbps or 1000 Mbps Ethernet networks. The synchronization protocol can adapt to this increased latency as it analyses the network's behavior during the setup phase.

Standard wireless networks also introduce random latency in order to prevent collisions during packet transmissions. These random delays make the synchronization in wireless networks more difficult than in Ethernet-based wired networks. The effects of said random delays can be reduced by selecting the acceptable maximum network latency using a stricter percentage value than when operating in wired networks. If a percentage of 30% is used instead of 90%, only transactions with less random delay will be used for clock synchronization. This modification means that each clock synchronization requires on average 3 ECHO REQUEST/ECHO REPLY transactions before acceptable values are acquired for the SET CLOCK command.
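
One way to read the percentage value is as a percentile of the latencies probed during setup. The following Python sketch illustrates that reading; the function names, the nearest-rank percentile rule and the probe values are assumptions for illustration and are not specified in the text.

```python
from typing import Sequence


def reference_latency_us(samples_us: Sequence[float], percent: int) -> float:
    """Smallest probed latency that at least `percent` % of samples do not exceed."""
    if not samples_us or not 0 < percent <= 100:
        raise ValueError("need samples and a percentage in 1..100")
    ordered = sorted(samples_us)
    rank = (percent * len(ordered) + 99) // 100  # ceiling, integer arithmetic
    return ordered[rank - 1]


def expected_transactions(percent: int) -> float:
    """Mean number of attempts until one passes, if attempts are independent."""
    return 100 / percent


# Illustrative probe results in microseconds, with two delayed packets.
probe = [340, 341, 342, 343, 344, 345, 346, 347, 360, 1500]
wired_limit = reference_latency_us(probe, 90)
wireless_limit = reference_latency_us(probe, 30)
```

Under the independence assumption, a 30% acceptance rate gives 100/30, or about 3.3, transactions per successful synchronization, which is consistent with the figure of roughly 3 given above.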

Wireless networks also typically have much larger packet loss than Ethernet-based wired networks due to radio interference and collisions during packet transmissions. To reduce the effects of packet loss, Forward Error Correction (FEC) encoding may be used to add redundancy to the transmitted audio data. This redundancy may be used by the receiver to reconstruct the audio packets lost by the network.
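
The text does not say which FEC code is used. As one simple illustration of the principle, the following Python sketch uses a single XOR parity packet per group of equal-length audio packets, which allows any one lost packet of the group to be rebuilt. The scheme, the function names and the sample bytes are assumptions chosen for illustration.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR equal-length packets byte by byte into one parity packet."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        if len(packet) != len(parity):
            raise ValueError("packets must have equal length")
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_lost(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked None) of a group."""
    if received.count(None) != 1:
        raise ValueError("exactly one packet must be missing")
    present = [p for p in received if p is not None]
    # XOR of all surviving packets and the parity equals the lost packet.
    return xor_parity(present + [parity])
```

The cost of this particular scheme is one extra packet per group, and it cannot repair a group in which two or more packets are lost; stronger codes trade more redundancy for better loss tolerance.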

1-15. (canceled)
16. A data transfer method in a digital sound reproduction system, comprising method steps for generating a digital audio stream for multiple channels in a host data source, e.g. a computer, the audio stream is formed by multiple consecutive samples, receiving the digital audio stream sent by the host data source through a digital data transmission network by several digital receivers each of which including a microcontroller with a clock, the receivers further including means for generating an audio signal, wherein the host data source sends repeatedly a synchronization sample to at least one receiver, the receiver replies to the synchronization sample by a return sample, the host calculates a latency (T) for each receiver based on the sending time (Th₁) of the synchronization sample and the reception time (Th₂) of the return sample and the processing time (Tt₂−Tt₁) of the receiver, the host sends to the receiver information of the calculated latency (T) in combination with the time stamp the measurement time, based on this information the receiver adjusts the function of its clock, and the above synchronization steps are repeated continuously.
17. Method according to claim 16, wherein the digital audio stream is transmitted wirelessly to the receiver.
18. Method according to claim 16 or 17, wherein the receiver compensates for the clock difference by setting the local clock rate in order to obtain synch of the microcontroller of the receiver.
19. Method according to claim 16 or 17, wherein the host compares the calculated latency (T) with a reference latency and if the calculated latency (T) is larger than the reference latency, no adjustment information is sent to the receiver and the host starts a routine to redefine the reference latency.
20. Method according to claim 19, wherein the clock difference is compensated for by adding or removing samples to/from the audio data stream and adjusting clock value accordingly.
21. A data transfer system for a digital sound reproduction system, comprising a host data source, e.g., a computer for generating a digital audio stream for multiple channels, the audio stream is formed by multiple consecutive samples, a transmission path for the host data source, multiple digital receivers capable to communicate over the transmission path with the host data source, the receivers including means for receiving the digital audio stream sent by the host data source a microcontroller with a clock, and means for generating an audio signal, wherein the host computer has means for sending repeatedly a synchronization sample to at least one receiver, the receiver has means for replying to the synchronization sample by a return sample, the host includes further means for calculating a latency (T) for each receiver based on the sending time (Th₁) of the synchronization sample and the reception time (Th₂) of the return sample and the processing time (Tt₂−Tt₁) of the receiver, sending to the each receiver information of the calculated latency (T) in combination with the time stamp the measurement time, whereby based on this information the receiver includes means for adjusting the function of its clock, and the system includes means for repeating the above synchronization steps continuously.
22. The system according to claim 21, wherein it includes means for transmitting the digital audio stream wirelessly to the receiver.
23. The system according to claim 21 or 22, wherein the receiver includes means for compensating for the clock difference by setting the clock frequency of the microcontroller of the receiver.
24. The system according to claim 21 or 22, wherein the host includes means for comparing the calculated latency (T) with a reference latency and if the calculated latency (T) is larger than the reference latency, no adjustment information is sent to the receiver and the host starts a routine to redefine the reference latency.
25. The system according to any previous system claim 21 or 22, wherein the system includes means for compensating for the clock difference by adding or removing samples to/from the audio data stream.
26. A synchronization method in a digital sound reproduction system, comprising method steps for generating a digital audio stream in a host data source, e.g. a computer, receiving the digital audio stream sent by the host data source through a digital data transmission network by several digital receivers each of which including a microcontroller with a clock, the receivers further including means for generating an audio signal, whereby the receivers are grouped in a predetermined manner, wherein the host data source sends repeatedly a synchronization sample to all receivers of a group, the receivers reply to the synchronization samples by return samples, the host calculates a latency times (T) for each sample and each receiver based on sending time (Th₁) of the synchronization sample and the reception time (Th₂) of the return sample and the processing time (Tt₂−Tt₁) of the receiver, and based on the calculated latency times (T) the host forms statistically a reference latency value, below which time most of the latency times are.
27. Method according to claim 26, wherein the digital audio stream is transmitted wirelessly to the receiver.
28. Method according to claim 26 or 27, wherein the reference latency is set such that at least 80% of the measured and calculated latency values are below the reference latency.
29. Method according to claim 26 or 27, wherein the reference latency is set such that at least 50% of the measured and calculated latency values are below the reference latency.
30. Method according to claim 26 or 27, wherein it includes the following steps:
1. Sending ECHO REQ to receiver,
2. Getting timestamp TSsend for the packet containing ECHO REQ from timestamp driver,
3. Receiving ECHO RESP, whereby it will contain receiver ProcessingLatency, which is amount of microseconds receiver spent between receipt of the ECHO REQ and sending of ECHO RESP
4. Getting timestamp TSrecv for the ECHO RESP packet
5. Calculating Roundtrip latency is TSrecv − TSsend − ProcessingLatency